Egyptian Arabic
Iterative Layer Pruning for Efficient Translation Inference
Moslem, Yasmin, Farouq, Muhammad Hazim Al, Kelleher, John D.
Large language models (LLMs) have transformed many areas of natural language processing, including machine translation. However, efficient deployment of LLMs remains challenging due to their intensive computational requirements. In this paper, we address this challenge and present our submissions to the Model Compression track at the Conference on Machine Translation (WMT 2025). In our experiments, we investigate iterative layer pruning guided by layer importance analysis. We evaluate this method using the Aya-Expanse-8B model for translation from Czech to German, and from English to Egyptian Arabic. Our approach achieves substantial reductions in model size and inference time, while maintaining the translation quality of the baseline models.
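The abstract's core idea — iteratively removing the layers that matter least — can be sketched in a few lines. This is a toy, self-contained illustration, not the authors' actual procedure: the model here is a list of scalar functions, and the importance score (average input/output change on calibration data) is one plausible metric, assumed for demonstration, rather than the analysis used with Aya-Expanse-8B.

```python
# Toy sketch of iterative layer pruning guided by layer-importance analysis.
# Layers and the importance metric are hypothetical stand-ins.

def layer_importance(layer, calib_inputs):
    """Score a layer by how much it changes its input on calibration data.
    A layer whose output barely differs from its input is a pruning candidate."""
    change = 0.0
    for x in calib_inputs:
        change += abs(layer(x) - x)
    return change / len(calib_inputs)

def iterative_prune(layers, calib_inputs, rounds):
    """Repeatedly drop the least important remaining layer."""
    layers = list(layers)
    for _ in range(rounds):
        scores = [layer_importance(l, calib_inputs) for l in layers]
        del layers[scores.index(min(scores))]
    return layers

# Toy "model": three layers; the middle one is near-identity.
model = [lambda x: x * 2.0, lambda x: x + 0.001, lambda x: x - 1.0]
pruned = iterative_prune(model, calib_inputs=[1.0, 2.0, 3.0], rounds=1)
print(len(pruned))  # 2 layers remain; the near-identity layer was dropped
```

In a real LLM, the same loop would operate on transformer blocks, re-scoring after each removal so that later pruning decisions account for earlier ones.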
ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). R. Al-Sabbagh, Data in Brief 54 (2024) 110271.
Subject: Computer Science, Social Sciences
Specific subject area: Natural Language Processing, machine translation, large language models, translation studies, cross-linguistic analysis, lexical semantics
Data format: Translated and aligned
Type of data: Texts (bilingual tables in Microsoft Excel files)
Data collection: The ArzEn-MultiGenre dataset consists of three genres: song lyrics, novels, and subtitles. The data was gathered from various sources using different methods. A website was crawled for song lyrics using an in-house web crawler, and professional translators manually translated the lyrics into English. For novels, hard copies were collected in English and Egyptian Arabic, then scanned and converted into text files using an optical character recognizer (OCR). The OCR output was then manually reviewed and aligned.
Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts
Shang, Guokan, Abdine, Hadi, Chamma, Ahmad, Mohamed, Amr, Anwar, Mohamed, Bounhar, Abdelaziz, Herraoui, Omar El, Nakov, Preslav, Vazirgiannis, Michalis, Xing, Eric
We introduce Nile-Chat-4B, 3x4B-A6B, and 12B, a collection of LLMs for the Egyptian dialect, uniquely designed to understand and generate texts written in both Arabic and Latin scripts. Specifically, with Nile-Chat-3x4B-A6B, we introduce a novel language adaptation approach by leveraging the Branch-Train-MiX strategy to merge script-specialized experts into a single MoE model. Our Nile-Chat models significantly outperform leading multilingual and Arabic LLMs, such as LLaMa, Jais, and ALLaM, on our newly introduced Egyptian evaluation benchmarks, which span both understanding and generative tasks. Notably, our 12B model yields a 14.4% performance gain over Qwen2.5-14B-Instruct on Latin-script benchmarks. All our resources are publicly available. We believe this work presents a comprehensive methodology for adapting LLMs to dual-script languages, addressing an often overlooked aspect in modern LLM development.
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
Lim, Zheng Wei, Gupta, Nitish, Yu, Honglin, Cohn, Trevor
Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a postediting one, and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs, and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show that LLMs finetuned on Mufu-style prompts are robust to poor-quality auxiliary translation candidates, achieving performance superior to the NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining on average a 3.1 chrF improvement over the finetune-only baseline in low-resource translations. This performance gap is caused primarily by scant pre-training data in these languages (Wei et al., 2023; Yuan et al., 2024; Alves et al., 2024), and is difficult to overcome despite growing efforts to support translations of long-tail languages (Kudugunta et al., 2024; Bapna et al., 2022; Lu et al., 2024).
In this work, we introduce multilingual fused learning (Mufu), which combines multilingual context and a postediting task when translating into lower-resource languages using LLMs. Mufu-style prompts (see Table 1, top block) include several multilingual translation candidates along with a postediting target, from which a model learns "in-context" to translate from languages with which the target language is more closely aligned due to cultural relevance, geographical and genealogical proximity. We rely on a larger, more competent multilingual teacher model to generate auxiliary translations in these languages, which help disambiguate inputs and improve cross-lingual semantic alignment in a translation task.
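The prompt structure described above — auxiliary candidates plus a correction instruction — can be sketched as follows. This is a hypothetical template for illustration; the exact wording and layout used by the authors (their Table 1) may differ, and the candidate strings here are placeholders.

```python
# Hypothetical sketch of assembling a Mufu-style prompt: several auxiliary
# multilingual translation candidates plus a post-editing instruction.

def mufu_prompt(source, src_lang, tgt_lang, candidates):
    """candidates: dict mapping language name -> auxiliary translation,
    typically produced by a larger multilingual teacher model."""
    lines = [f"Source ({src_lang}): {source}"]
    for lang, text in candidates.items():
        lines.append(f"Candidate ({lang}): {text}")
    lines.append(
        "Some candidates may be inaccurate. Using the source and the "
        f"candidates, write a corrected translation in {tgt_lang}:"
    )
    return "\n".join(lines)

prompt = mufu_prompt(
    "The river flooded the valley.",
    "English", "Egyptian Arabic",
    {
        "Modern Standard Arabic": "<teacher translation 1>",
        "French": "<teacher translation 2>",
    },
)
print(prompt)
```

Finetuning against prompts of this shape is what turns the translation task into a postediting one: the model must weigh the source against possibly unreliable candidates rather than translate from scratch.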
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs
Heakl, Ahmed, Zaghloul, Youssef, Ali, Mennatullah, Hossam, Rania, Gomaa, Walid
Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state of the art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon, as it enables seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.
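The "consecutive speech-to-text translation system" the abstract mentions is a cascade: an ASR step feeds its transcript to an MT step. The stub below illustrates only that wiring; the recognize/translate functions are hypothetical stand-ins for Whisper and an LLM translator, with a hard-coded toy transcript, not the authors' models.

```python
# Minimal sketch of a cascaded (consecutive) speech-to-text translation
# pipeline: audio -> ASR transcript -> MT translation. Both stages are
# hypothetical stand-ins for the real Whisper / LLM components.

def recognize(audio):
    # stand-in for Whisper transcription of code-switched speech
    return "ana kont fi el meeting imbareh"

def translate(text, target="English"):
    # stand-in for an LLM-based translation step (e.g., LLama or Gemma)
    table = {"ana kont fi el meeting imbareh": "I was in the meeting yesterday."}
    return table.get(text, text)

def speech_to_translation(audio, target="English"):
    """Cascade: speech -> transcript -> translation."""
    return translate(recognize(audio), target)

print(speech_to_translation(b"<audio bytes>"))
```

A practical consequence of this design, implied by the abstract, is that ASR errors on code-switched spans propagate into the MT stage, which is why robust code-switched recognition is emphasized.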
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Faisal, Fahim, Ahia, Orevaoghene, Srivastava, Aarohi, Ahuja, Kabir, Chiang, David, Tsvetkov, Yulia, Anastasopoulos, Antonios
Language technologies should be judged on their usefulness in real-world use cases. An often overlooked aspect in natural language processing (NLP) research and evaluation is language variation in the form of non-standard dialects or language varieties (hereafter, varieties). Most NLP benchmarks are limited to standard language varieties. To fill this gap, we propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties, which aggregates an extensive set of task-varied variety datasets (10 text-level tasks covering 281 varieties). This allows for a comprehensive evaluation of NLP system performance on different language varieties. We provide substantial evidence of performance disparities between standard and non-standard language varieties, and we also identify language clusters with large performance divergence across tasks. We believe DIALECTBENCH provides a comprehensive view of the current state of NLP for language varieties and one step towards advancing it further. Code/data: https://github.com/ffaisal93/DialectBench
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
Hamed, Injy, Habash, Nizar, Abdennadher, Slim, Vu, Ngoc Thang
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.
Learning Arabic from Egypt's Revolution
When you move to another country as an adult, the language flows around you like a river. Perhaps a child can immediately abandon himself to the current, but most older people will begin by picking out the words and phrases that seem to matter most, which is what I did after my family moved to Cairo, in October of 2011. It was the first fall after the Arab Spring; Hosni Mubarak, the former President, had been forced to resign the previous February. Every weekday, my wife, Leslie, and I met with a tutor for two hours at a language school called Kalimat, where we studied Egyptian Arabic. At the end of each session, we made a vocabulary list. In early December, following the first round of the nation's parliamentary elections, which had been dominated by the Muslim Brotherhood, my language notebook read:
On many days, I went to Tahrir Square, to report on the ongoing revolution. If I heard unfamiliar words or phrases, I brought them back to class. The following month, I learned "tear gas," "slaughter," and "Can you speak more slowly?" "Conspiracy theory" appeared in my notebook on the same day as "fried potatoes." Sometimes I wondered about the strangeness of Tahrir-speak, and what my Arabic would have been like if I had arrived ten years earlier. But it would have been different at any time, in any place: you can never step into the same language twice.
Even eternal phrases took on a new texture in the light of the revolution. After I could understand some of the radio talk shows that cabbies played, I realized that callers and hosts exchanged Islamic greetings for a full half minute before settling down to heated arguments about the new regime. Our textbook was entitled "Dardasha"--"Chatter"--and it outlined set conversations that I soon carried out with neighbors, using phrases that would never be touched by Tahrir: "May peace, mercy, and the blessings of God be upon you."
One of our teachers, Rifaat Amin, prepared a five-page handout entitled "Arabic Expressions of Social Etiquette." This supplemented "Dardasha," which also featured some lessons about social traditions, including the evil eye, the belief that envy can cause misfortune. In "Dardasha," icons of little bombs with burning fuses had been printed next to the kind of phrase that, even during a revolution, qualified as explosive: "Your son is really smart, Madame Fathiya."